1. Introduction


The  Computer  Vision  Laboratory  at  the  University  of  Maryland  has  been 
participating  in  DARPA’s  Strategic  Computing  Program  for  the  past  year. 
Specifically,  we  have  been  developing  a  computer  vision  system  for  autonomous 
ground  navigation  of  roads  and  road  networks.  Figure  1  contains  a  block 
diagram  of  the  system  as  it  is  currently  configured.  The  complete  system  runs  on 
a  VAX  11/785,  but  certain  parts  of  the  system  have  been  reimplemented  on  a 
VICOM  image  processing  system  for  experimentation  on  an  autonomous  vehicle 
built  for  the  Martin  Marietta  Corp.,  Aerospace  Division  in  Denver,  Colorado. 

An early description of the system was presented in [1]; our current version
is described in [2,3]. We give a brief overview here of the principal software
components of the system, and then describe the VICOM implementation in detail
in the body of this paper.

The Vision Executive (see Figure 1) is responsible for the overall coordination
of the system. It represents a centralized source of control that is responsible
for scheduling the activities of all of the vision and reasoning processes in the
system. It is currently implemented in C on the VAX, but is being redesigned
and reimplemented in FLAVORS [4] to run on a SYMBOLICS LISP machine.

The Image Processing module transforms an input image into a symbolic
representation of the boundaries of the roads in the field of view. It runs in one
of two modes: a bootstrap mode and a feed-forward mode. The bootstrap mode
is used to develop an initial representation of the road on which the vehicle is to
travel. Since we assume, at this point, that aside from map information the
vehicle has no preconceptions about where the road will be in its field of view or
what the detailed structure of that road is (e.g., single lane, with or without
shoulders, lane markings, etc.), the bootstrap image processing performs a global
analysis of the image to identify significant global linear features. These linear
features are grouped into elements called "pencils" (convergent lines in the image
plane), which are the units reasoned about by the knowledge base and the
geometry module.

During continuous operation, of course, the system has fairly specific
expectations concerning the position and appearance of the road. These
expectations are generated by the Prediction Module which, based on a
three-dimensional model of the road constructed by the Geometry and
Knowledge-Based Modules and an estimate of the travel between consecutive
frames obtained from an inertial navigation system (INS), generates a prediction
of where the boundaries of the road will appear near the bottom of the current
frame. This prediction is used to constrain the analysis of the image processing
operators in the so-called "feed-forward" mode of operation. Here, based on the
prediction of where the road boundaries will appear, the Vision Executive
identifies small windows in the image that will contain pieces of the left and
right road boundary and, using a tightly constrained analysis (since both the
geometric and photometric properties of large pieces of the road can be carried
forward from the analysis of previous frames), identifies the projections of the
road boundaries through those windows. Based on the computed locations of the
road boundaries through those windows, subsequent windows are placed and the
road is tracked through the image.

It is this set of feed-forward image processing algorithms that has been
reimplemented on the VICOM image processor. The VICOM implementation
allowed us to take advantage of some of the special-purpose hardware that the
VICOM provides for image processing, but at the same time forced us to consider
seriously the various time/space trade-offs that would be resolved in one way on
a machine such as the VAX (with a moderately fast instruction cycle and built-in
floating point operations), but in quite a different way on a Motorola 68000.

Section 2 of this paper contains a brief description of the VICOM image
processor as it is configured in our laboratory and at the Martin Marietta test site.
In  Section  3  we  present  a  detailed  description  of  the  feed-forward  algorithms  as 
they  are  implemented  on  the  68000.  Section  4  contains  the  results  of  applying 
those  algorithms  to  several  images  taken  from  the  test  vehicle  at  the  test  track  in 
Denver. 

2. VICOM Image-Processor Configuration

The University of Maryland and Martin Marietta test site VICOM image
processors are standard VICOM VDP configurations. Each can be functionally
separated into a standard Motorola 68000 microcomputer system and a
special-purpose image processor. The microcomputer contains up to 1.5 megabytes
(Mbytes) of main memory, with a combination of 332-Mbyte Winchester-style
disk and/or 25- or 16-Mbyte diskette systems. The image processor contains a
three-channel color (RGB) analog Video Input Digitizer (VID), twelve 512-by-512
16-bit-pixel image memories, a three-channel color (RGB) display system, and
dedicated image point, ensemble, spatial, and morphological processors. The
dedicated processors can perform 12-bit-in/16-bit-out single-point lookup
operations, 12-bit ensemble arithmetic (addition, subtraction, multiplication, or
logical), image-pair or image/constant combination, 12-bit 3-by-3 pixel spatial
convolution, or morphological (binary-image) operations in approximately 0.333
seconds per frame. VICOM memory pixels are directly accessible to the 68000
microprocessor, at a slight cost over the local processor-memory cycle time. The
laboratory (non-vehicle) systems are supplied with a VAX host interface and a
trackball and mouse.

Software  for  the  VICOM  was  written  using  VERSADOS  Pascal  and 
Motorola  68000  Assembler,  interfacing  to  the  VICOM  VDP  Image  Processor 
Applications  Library  and  Hardware  Driver  packages.  An  Advanced  Information 
and  Decision  Systems  package  was  used  for  transferring  information  between  the 
VAX  host  and  the  laboratory  VICOM. 

3.  Extracting  Linear  Features 

The VAX 11/785 implementation of the feed-forward image processing steps
is described in [3]. The VICOM implementation exploits special-purpose image
processing hardware, accommodates specific hardware features, and departs
from the VAX implementation steps where time/space trade-offs warrant.


The vehicle camera acquires a 512-by-512, 8-bit/pixel, three-channel color
(RGB) image via the VICOM VID. Only a single color band (ordinarily red) is
analyzed by the feed-forward algorithm. The input image is converted to
VICOM's 16-bit two's-complement pixel format using a high-speed
(one-frame-time) pipeline operation. The image is then subsampled from
512-by-512 to 256-by-256 spatial resolution for further processing (as this is a
slow microprocessor-performed step, it can be integrated into later steps to save
time).

Image edges are then extracted over the entire image using the Sobel edge
detector. The x- and y-derivative images are each obtained in one
pipeline-convolution frame-time. To save processing time, an image whose pixels
contain the concatenation of the six most-significant bits of the x- and
y-derivative images is produced. This image can be converted to the gradient
magnitude and direction images in two pipelined steps using
specially-constructed lookup tables. The concatenated image is produced in three
pipelined operations: two specially-constructed table lookup operations align the
six most-significant bits of the x- and y-derivative images, and one logical OR
operation combines the aligned images. The precision loss in this step is five bits
from each derivative image. Finally, the gradient magnitude and direction are
each computed using a single table lookup.
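
The packing-and-lookup idea in this paragraph can be illustrated with a small sketch. It is not the VICOM code (which ran as pipelined hardware lookups); it only shows, under stated assumptions, how two signed 6-bit derivatives can be concatenated into one 12-bit index and how magnitude and direction tables over that index could be precomputed. The offset encoding of signed values, the rounding, and the names `pack`, `build_tables`, `MAG`, and `ANG` are assumptions made for illustration.

```python
import math

BITS = 6                    # bits kept from each derivative (from the text)
HALF = 1 << (BITS - 1)      # 32; assumed offset mapping signed values to 0..63


def pack(gx6, gy6):
    """Concatenate two signed 6-bit derivatives into one 12-bit table index."""
    return ((gx6 + HALF) << BITS) | (gy6 + HALF)


def build_tables():
    """Precompute magnitude and direction for every packed 12-bit value."""
    size = 1 << (2 * BITS)  # 4096 entries, one per packed pixel value
    mag = [0] * size
    ang = [0] * size
    for gx in range(-HALF, HALF):
        for gy in range(-HALF, HALF):
            i = pack(gx, gy)
            mag[i] = round(math.hypot(gx, gy))                      # magnitude
            ang[i] = round(math.degrees(math.atan2(gy, gx))) % 360  # degrees
    return mag, ang


MAG, ANG = build_tables()

# One lookup per output image: a packed pixel yields magnitude and direction.
pixel = pack(3, 4)
print(MAG[pixel], ANG[pixel])
```

A 12-bit index is consistent with the 12-bit-in point lookup described in Section 2, which is presumably why six bits per derivative were kept.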

The initial windows covering segments of the left and right boundaries of the
road are chosen based on projecting the current 3-D road model onto the image
plane and determining where the road boundaries enter the image. For these
windows we must estimate both the orientation $\theta$ and position $\rho$ of the
(assumed locally straight) projection of the road edges. Since $\theta$ is
constrained somewhat by the prediction, we can ignore any edge points in the
window whose directions differ significantly from the predicted value of
$\theta$. In addition to "thresholding" the edge points in a window based on
gradient direction, we also apply a conservative threshold on the gradient
magnitudes. A Hough transform is then computed using the remaining edge
points to estimate both $\theta$ and $\rho$. The Hough transform would ordinarily
be computed using the following simple algorithm

    for each edge point $(x, y)$
        for $\theta = \theta_{min}$ to $\theta_{max}$ step $\Delta\theta$
            $\rho = [\,x\cos\theta + y\sin\theta\,]$        (*)
            $H(\rho, \theta) = H(\rho, \theta) + 1$

where $H$ is the Hough transform array. On the VICOM, however, step (*) is
very expensive even if the values of the cosine and sine are precomputed and all
arithmetic is performed using fixed-point operations. Therefore, we replaced
these arithmetic operations by a microprocessor-performed table lookup on the
VICOM, assuming a fixed maximum window size (64 by 64) and
$\Delta\theta = 3^\circ$. This table is 240K bytes and can reside in either the
VICOM program or image memory. The coordinates of the element in $H$ having
maximal value determine the projection of the road edge through the window. In
subsequent windows, the Hough transform is simplified by constraining the lines
in these windows to connect to the lines in the immediately preceding windows
(the road continuity assumption); e.g., an endpoint of the line in the previous
window becomes the pivot, or intercept, of the line in the current window.
Furthermore, we constrain the orientation of the line in the current window to be
in a small interval, $[\theta_m, \theta_M]$, centered about the orientation of the
line in the previous window.
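
As a rough illustration of the table-lookup replacement for step (*), the sketch below precomputes the position parameter for every pixel of a 64-by-64 window at every 3-degree orientation and then accumulates votes using only table reads. The quoted table size is consistent with one byte per entry if the orientation spans 180 degrees (64 x 64 x 60 = 245,760 bytes = 240K); that span, the byte offset `RHO_OFFSET`, the rounding, and all function names are assumptions of this sketch, not details given in the text.

```python
import math

W = 64                        # maximum window side (from the text)
D_THETA = 3                   # orientation step in degrees (from the text)
N_THETA = 180 // D_THETA      # 60 orientations; the 180-degree span is assumed
RHO_OFFSET = W                # assumed shift so every entry fits in one byte


def build_rho_table():
    """One byte per (x, y, orientation): the precomputed result of step (*)."""
    cos_t = [math.cos(math.radians(k * D_THETA)) for k in range(N_THETA)]
    sin_t = [math.sin(math.radians(k * D_THETA)) for k in range(N_THETA)]
    table = bytearray(W * W * N_THETA)
    for y in range(W):
        for x in range(W):
            base = (y * W + x) * N_THETA
            for k in range(N_THETA):
                rho = round(x * cos_t[k] + y * sin_t[k])
                table[base + k] = rho + RHO_OFFSET
    return table


def hough(edge_points, table):
    """Vote with table reads only; return the (rho, theta_deg) of the peak."""
    acc = {}
    for x, y in edge_points:
        base = (y * W + x) * N_THETA
        for k in range(N_THETA):
            key = (table[base + k] - RHO_OFFSET, k * D_THETA)
            acc[key] = acc.get(key, 0) + 1
    return max(acc, key=acc.get)


TABLE = build_rho_table()
print(len(TABLE))                                  # table size in bytes
print(hough([(10, y) for y in range(20)], TABLE))  # edge points on x = 10
```

The inner loop performs no multiplication or trigonometry, which is the trade the paragraph describes: memory is spent to avoid arithmetic that is slow on the 68000.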

Since the pivot point is fixed, we need to estimate only one parameter: the
direction, $\theta$, of the line through the pivot point. Given the pivot point
$(x_p, y_p)$ and any edge point $(x_e, y_e)$ in the window, the Hough parameter
$\theta$ is simply

$$\theta = \tan^{-1}\left(\frac{y_e - y_p}{x_e - x_p}\right).$$

The values of $\theta$ can be stored in a lookup table. The two lookup parameters
$\Delta y$ and $\Delta x$, given by

$$\Delta y = y_e - y_p, \qquad \Delta x = x_e - x_p,$$

are in the range $[-W, +W]$, where $W$ is the maximum length of a window side.
For a 64-by-64 maximum window size, this results in a 16K-byte table.
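
A minimal sketch of such a direction table follows. It stores one byte per pair of coordinate differences, giving 128 x 128 = 16,384 entries, which matches the 16K-byte figure. Several details are assumptions rather than facts from the text: differences are indexed over the half-open range from -W to W-1, directions are stored in whole degrees folded into 0..179, and `math.atan2` is substituted for the plain arctangent of the quotient so that a vertical line does not cause a division by zero.

```python
import math

W = 64                # maximum window side (from the text)
SIDE = 2 * W          # 128 index values per axis (assumed half-open range)


def build_theta_table():
    """Line direction in degrees (0..179) for every (dy, dx) difference pair."""
    table = bytearray(SIDE * SIDE)       # 16384 one-byte entries
    for dy in range(-W, W):
        for dx in range(-W, W):
            deg = round(math.degrees(math.atan2(dy, dx))) % 180
            table[(dy + W) * SIDE + (dx + W)] = deg
    return table


THETA = build_theta_table()


def direction(pivot, edge):
    """Look up the direction of the line from the pivot through an edge point."""
    dx = edge[0] - pivot[0]
    dy = edge[1] - pivot[1]
    return THETA[(dy + W) * SIDE + (dx + W)]


print(len(THETA))                        # table size in bytes
print(direction((10, 10), (20, 20)))     # a diagonal line through the pivot
```

With the pivot fixed, each edge point casts a single vote at the looked-up direction, so the accumulator is one-dimensional.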

Notice that if the interval $[\theta_m, \theta_M]$ is small, then most points in
the square window cannot possibly lie on the line being sought, and it would be
wasteful to consider those points at all. We can efficiently enumerate the points
in the convex region corresponding to the intersection of the square window and
the "infinite" cone centered at $(x_p, y_p)$ and bounded by lines $l_1$ and $l_2$
at orientations $\theta_m$ and $\theta_M$, respectively, through $(x_p, y_p)$,
using a Discrete Differential Analyzer (DDA) [5] to enumerate the grid points on
$l_1$ and $l_2$ starting from $(x_p, y_p)$. The DDA algorithm takes a point
$(x_p, y_p)$ and an angle $\theta$ (rather than two line endpoints as in the
standard DDA) and generates points on the line having unit spacing in either the
horizontal or vertical direction. For horizontal spacing, the relationship between
the $i$th and $(i-1)$st points generated is:


$$x_i = x_{i-1} + 1$$
$$y'_i = y'_{i-1} + \tan\theta$$
$$y_i = [\,y'_i\,]$$

For vertical spacing, we have:

$$y_i = y_{i-1} + 1$$
$$x'_i = x'_{i-1} + \cot\theta$$
$$x_i = [\,x'_i\,]$$


The  choice  of  horizontal  or  vertical  spacing  is  determined  by  the  octant  in  which 
the  line  lies. 
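
The angle-driven DDA described above might be sketched as follows. This is an illustrative reconstruction, not the 68000 implementation: it assumes a mathematical axis orientation (y increasing with the sine of the angle), uses rounding to the nearest integer for the bracket operation, steps in the negative direction when the line points that way, and the function name `dda_from_angle` is hypothetical.

```python
import math


def dda_from_angle(xp, yp, theta_deg, n):
    """Return n grid points following (xp, yp) along direction theta_deg."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    points = []
    if abs(c) >= abs(s):
        # Shallow line: unit steps in x, fractional y grows by tan(theta).
        sx = 1 if c > 0 else -1
        dy = s / abs(c)
        x, yf = xp, float(yp)
        for _ in range(n):
            x += sx
            yf += dy
            points.append((x, round(yf)))
    else:
        # Steep line: unit steps in y, fractional x grows by cot(theta).
        sy = 1 if s > 0 else -1
        dx = c / abs(s)
        xf, y = float(xp), yp
        for _ in range(n):
            y += sy
            xf += dx
            points.append((round(xf), y))
    return points


print(dda_from_angle(0, 0, 45, 3))
print(dda_from_angle(5, 5, 90, 3))
```

Running the routine once for each bounding orientation of the cone gives, on every row or column, the two grid points between which candidate edge points need to be examined.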

Finally, a fixed number of Hough accumulator peak lines are selected, and
the peak line closest in direction to the line in the previous window is chosen.
To handle slightly curving roads, the line is cut back a fixed fraction above the
pivot point (typically, by one-half).
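
The selection and cut-back steps can be sketched as below. The data layout (an accumulator mapping direction in degrees to vote count), the angular-distance measure, and the names `top_peaks`, `choose_line`, and `cut_back` are all assumptions made for illustration; the text does not specify how many peaks are kept or how closeness in direction is measured.

```python
def top_peaks(acc, k):
    """acc maps direction (degrees) to votes; return the k strongest peaks."""
    return sorted(((v, t) for t, v in acc.items()), reverse=True)[:k]


def choose_line(peaks, prev_theta):
    """Among (votes, theta) peaks, return the theta closest to prev_theta."""
    def angular_gap(theta):
        d = abs(theta - prev_theta) % 180   # line directions repeat every 180
        return min(d, 180 - d)
    return min(peaks, key=lambda p: angular_gap(p[1]))[1]


def cut_back(pivot, end, keep=0.5):
    """Shorten the segment from pivot to end, keeping the part near the pivot."""
    (xp, yp), (xe, ye) = pivot, end
    return (xp + keep * (xe - xp), yp + keep * (ye - yp))


acc = {60: 5, 90: 9, 93: 8, 120: 2}
peaks = top_peaks(acc, 3)
print(choose_line(peaks, prev_theta=95))
print(cut_back((10, 40), (20, 20)))
```

Preferring the peak nearest the previous direction, rather than the strongest one, favors continuity of the road edge over isolated strong edges elsewhere in the window.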

Window  placement  terminates  when  a  window  reaches  a  fixed  image-height 
fraction  (typically,  80%),  when  a  window  leaves  the  image,  or  when  the  left  and 
right  window  sequences  cross  (e.g.,  at  a  horizon  vanishing  point). 


Using 48-by-32 first windows and 32-by-32 subsequent windows on Denver
test-track vehicle images, processing typically takes 6-7 seconds of microprocessor
and pipelined-image-processor time, including approximately 2.5 seconds for the
first two windows and 1.5 seconds for the subsequent windows.

4.  Results 

Figure 2 illustrates the results of applying these algorithms to several images
taken from the vehicle at the Denver test track. In Figure 2a we show, for each
window, the edge points (thresholded on the basis of both direction and
magnitude) that fall within the "cone" generated by the DDA algorithm. Figure 2b
shows the superposition of the detected road edges on the images in Figure 2a.
Finally, Figure 2c contains the original image, along with the windows and
located road boundaries.

5.  Summary 

This paper has described the image processing component of a computer
vision system for autonomous ground navigation of simple roads. The component
identifies road boundaries in small image windows whose positions are predicted
from a three-dimensional model of the road and the estimated distance traveled
during continuous operation. Based on the computed locations of the road
boundaries through those windows, subsequent windows are placed in the image
and the road is tracked through the image. This set of algorithms has been
implemented on the VICOM image processor to take advantage of some
special-purpose hardware for image processing, but the implementation required
some trade-offs (heavy reliance on table lookup operations) due to the nature of
the host machine.

Figure 2. These are the results of applying these algorithms to several images
taken from the vehicle at the Denver test track.

Figure 2a. The edge points (thresholded on the basis of both direction and
magnitude) that fall within the "cone" generated by the DDA algorithm.

Figure 2c. The original image, along with the windows and located road
boundaries.


